Skip to content

REF: Base class for all extension tests #19863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 24, 2018

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Feb 23, 2018

Added the staticmethods

  • assert_series_equal
  • assert_frame_equal

to the base class. Useful for test cases like DecimalArray with NaNs,
since our assert_*_equal methods don't properly handle Decimal NaNs.

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Added the staticmethods

- assert_series_equal
- assert_frame_equal

to the base class. Useful for test cases like DecimalArray with NaNs,
since our assert_*_equal methods don't properly handle Decimal NaNs.
@TomAugspurger TomAugspurger added Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite Internals Related to non-user accessible pandas implementation labels Feb 23, 2018
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Feb 23, 2018
@TomAugspurger TomAugspurger mentioned this pull request Feb 23, 2018
15 tasks
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we advertising this kind of api for testing? when we always use tm.assert_*? this is too easy to get wrong. wouldn't it be better to use that as the main one (e.g. make them EA aware)?

@@ -4,8 +4,10 @@
import pandas.util.testing as tm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we don't / shouldn't import tm in any of the subclasses of EA for testing (otherwise easy to accidently NOT use self.assert_*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still used for things like assert_raises_regex. But maybe we could import them explicitly (from pandas.util.testing import ...

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me


All the tests in these modules use ``self.assert_frame_equal`` or
``self.assert_series_equal`` for dataframe or series comparisons. By default,
they use the usual ``pandas.util.testing.assert_frame_equal`` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.util.testing.assert_frame_equal -> pandas.testing.assert_frame_equal (that is the public API path)

@@ -4,8 +4,10 @@
import pandas.util.testing as tm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still used for things like assert_raises_regex. But maybe we could import them explicitly (from pandas.util.testing import ...

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Feb 23, 2018

wouldn't it be better to use that as the main one (e.g. make them EA aware)?

I thought about doing that for the test cases we've written. Decimal('NaN') is the problem I've hit since we don't compare NaNs "correctly". But I don't think it'd be worthwhile putting in fixes for those specific ones since,

1.) It's only solving the problem for our specific functions. 3rd party libraries writing EAs would have no way to fix it.
2.) It's complicating our assert_*_equal methods for very little gain.

(otherwise easy to accidently NOT use self.assert_*

Yes, this is going to be a problem going forward. I'm not sure how best to avoid it, but avoiding a tm import is a good start.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Feb 23, 2018

I could also add a linting rule that looks for tm.assert* in these submodules. Thoughts?

Edit: ahh that may be hard since in a followup I define assert_*_equal for DecimalArray, but it uses tm.assert_*_ internally.

@jorisvandenbossche
Copy link
Member

why are we advertising this kind of api for testing? when we always use tm.assert_*? this is too easy to get wrong. wouldn't it be better to use that as the main one (e.g. make them EA aware)?

It's only for testing extension arrays (so for developers), not public API for users.
The problem is that we cannot know in general how an extension array will be laid out internally, so we cannot make our test methods fully aware of them.

Although, one alternative would be to define a equals method on ExtensionArray that deals with NaNs.

@TomAugspurger
Copy link
Contributor Author

Although, one alternative would be to define a equals method on ExtensionArray that deals with NaNs.

And then just call .equals on it in assert_series_equal if they're an extension dtype? I could get behind that.

@jorisvandenbossche
Copy link
Member

I am only not sure I am fully convinced myself :). The problem with and .equals method like that, it that it does not leave the space for an element-wise equals method. But of course with Series.equals already there, it might also be confusing that add such a method as an extension author.

@TomAugspurger
Copy link
Contributor Author

does not leave the space for an element-wise equals method

I think following the lead of Categorical is best here. .equals for returning True/False and .__eq__ for elementwise.

That said, .equals isn't great for the assert_*_equal methods since we want an informative traceback.

I should be able to add a linting rule that looks for tm.assert_*_equal only the the extension/base module. Then we're free to use it in extension/decimal.py

@codecov
Copy link

codecov bot commented Feb 23, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@01e99de). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #19863   +/-   ##
=========================================
  Coverage          ?   91.64%           
=========================================
  Files             ?      150           
  Lines             ?    48946           
  Branches          ?        0           
=========================================
  Hits              ?    44858           
  Misses            ?     4088           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.02% <ø> (?)
#single 41.81% <ø> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01e99de...e0164a4. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

OK, this should now fail if tests in extension/base/*.py use tm.assert_(frame|series)_equal. I've excluded base/base.py since that's where the defaults are set.

@jorisvandenbossche
Copy link
Member

I think following the lead of Categorical is best here. .equals for returning True/False and .eq for elementwise.

But, __eq__ has no flexibilty regarding NaNs, so that's a reason one would want a element-wise equals (and from that, you can always do a .all() to get a single boolean). Anyhow, the existing behaviour is of course there, but if I would design it from scratch, not sure if I would do equals as it is now.

@jorisvandenbossche
Copy link
Member

But for this PR, I would personally for now do like you did here. We can later still discuss if we want an equals method or not.

@jreback jreback merged commit e362281 into pandas-dev:master Feb 24, 2018
@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

thanks!

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
@TomAugspurger TomAugspurger deleted the fu1+base-test-class branch May 2, 2018 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants